Detecting sarcasm and verbal irony in people's subjective statements is crucial to understanding their intended meanings, real sentiments, and positions in social scenarios. This paper describes the X-PuDu system that participated in SemEval-2022 Task 6, iSarcasmEval - Intended Sarcasm Detection in English and Arabic, which aims at detecting intended sarcasm in various natural language understanding settings. Our solution fine-tunes pre-trained language models, such as ERNIE-M and DeBERTa, under multilingual settings to recognize irony in Arabic and English texts. Our system ranked second out of 43 and ninth out of 32 in Task A: one-sentence detection in English and Arabic; fifth out of 22 in Task B: binary multi-label classification in English; and first out of 16 and fifth out of 13 in Task C: sentence-pair detection in English and Arabic.
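As a rough illustration of the fine-tuning setup described above, the sketch below trains a binary sarcasm classifier on top of a pre-trained encoder with Hugging Face Transformers. The checkpoint name, toy examples, and hyperparameters are illustrative assumptions, not the X-PuDu team's actual configuration.

```python
# Minimal fine-tuning sketch for binary sarcasm detection with a pre-trained
# encoder. The checkpoint, toy examples, and hyperparameters are illustrative
# assumptions, not the X-PuDu team's actual configuration.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-base", num_labels=2  # 0 = not sarcastic, 1 = sarcastic
)

texts = ["Oh great, another Monday.", "The weather is lovely today."]
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
loss = model(**batch, labels=labels).loss  # cross-entropy is computed internally
loss.backward()
optimizer.step()
```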
To facilitate video denoising research, we construct a compelling dataset, namely the Practical Video Denoising Dataset (PVDD), which contains 200 noisy-clean dynamic video pairs in both sRGB and RAW formats. Compared with existing datasets that contain only limited motion information, PVDD covers dynamic scenes with varied and natural motion. Unlike datasets that synthesize noise in the sRGB domain using primarily Gaussian or Poisson distributions, PVDD synthesizes realistic noise in the RAW domain through a physically meaningful sensor noise model followed by ISP processing. Furthermore, based on this dataset, we propose a shuffle-based practical degradation model to boost the performance of video denoising networks on real-world sRGB videos. Extensive experiments demonstrate that models trained on PVDD achieve superior denoising performance on many challenging real-world videos compared with models trained on other existing datasets.
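For intuition, here is a minimal sketch of the kind of physically motivated RAW-domain noise synthesis the abstract describes: signal-dependent shot noise plus read noise, followed by an ISP step. The noise parameters and the toy gamma-only ISP are illustrative assumptions, not the actual PVDD pipeline.

```python
# Minimal sketch of a physically motivated RAW-domain noise model (shot + read
# noise), followed by a placeholder ISP step. Parameter values and the gamma-only
# ISP are illustrative assumptions, not the actual PVDD pipeline.
import numpy as np

def add_raw_noise(raw, k=0.01, sigma_read=0.002, rng=np.random.default_rng(0)):
    """raw: clean RAW image normalized to [0, 1]."""
    shot = rng.poisson(raw / k) * k                 # signal-dependent (Poisson) shot noise
    read = rng.normal(0.0, sigma_read, raw.shape)   # signal-independent read noise
    return np.clip(shot + read, 0.0, 1.0)

def toy_isp(raw):
    """Stand-in for the ISP: white balance, demosaicing, tone mapping would go here."""
    return np.clip(raw, 0.0, 1.0) ** (1 / 2.2)      # simple gamma as a placeholder

clean_raw = np.random.default_rng(1).random((64, 64))
noisy_srgb = toy_isp(add_raw_noise(clean_raw))
```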
Incorporating external knowledge into the response generation process is essential to building more helpful and reliable dialog agents. However, collecting knowledge-grounded conversations is often costly, calling for a better pre-trained model for grounded dialog generation that generalizes well across different types of knowledge. In this work, we propose KPT (Keyword-guided Pre-Training), a novel self-supervised pre-training method for grounded dialog generation that does not rely on extra knowledge annotation. Specifically, we use a pre-trained language model to extract the most uncertain tokens in the dialog as keywords. With these keywords, we construct two kinds of knowledge and pre-train a knowledge-grounded response generation model, aiming at handling two different scenarios: (1) the knowledge should be faithfully grounded; (2) it can be selectively used. For the former, the grounding knowledge consists of keywords extracted from the response. For the latter, the grounding knowledge is additionally augmented with keywords extracted from other utterances in the same dialog. Since the knowledge is extracted from the dialog itself, KPT can easily be applied to a large volume and variety of dialog data. We consider three data sources (open-domain, task-oriented, conversational QA) with a total of 2.5M dialogs. We conduct extensive experiments on various few-shot knowledge-grounded generation tasks, including grounding on dialog acts, knowledge graphs, persona descriptions, and Wikipedia passages. Our comprehensive experiments and analyses demonstrate that KPT consistently outperforms state-of-the-art methods on these tasks with diverse grounding knowledge.
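One plausible reading of "extracting the most uncertain tokens as keywords" is to mask each token in turn and keep those to which a masked language model assigns the lowest probability; the sketch below implements that reading with Hugging Face Transformers. The checkpoint, scoring rule, and top-k value are assumptions, not necessarily the exact KPT procedure.

```python
# One plausible reading of "most uncertain tokens": mask each token in turn and
# keep those the masked LM assigns the lowest probability to. The checkpoint,
# scoring rule, and top_k value are illustrative assumptions, not necessarily
# the exact KPT procedure.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

def uncertain_keywords(text, top_k=3):
    ids = tokenizer(text, return_tensors="pt")["input_ids"][0]
    scored = []
    for i in range(1, len(ids) - 1):                 # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        prob = torch.softmax(logits, dim=-1)[ids[i]].item()
        scored.append((prob, tokenizer.decode([int(ids[i])])))
    scored.sort(key=lambda pair: pair[0])            # lowest probability = most uncertain
    return [token for _, token in scored[:top_k]]

print(uncertain_keywords("I booked a table for two at the Italian place downtown."))
```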
Conditional variational models, using either continuous or discrete latent variables, are powerful for open-domain dialogue response generation. However, previous works show that continuous latent variables tend to reduce the coherence of generated responses. In this paper, we also find that discrete latent variables have difficulty capturing more diverse expressions. To tackle these problems, we combine the merits of both continuous and discrete latent variables and propose a Hybrid Latent Variable (HLV) method. Specifically, HLV constrains the global semantics of responses through discrete latent variables and enriches responses with continuous latent variables. Thus, we diversify the generated responses while maintaining relevance and coherence. In addition, we propose the Conditional Hybrid Variational Transformer (CHVT) to construct and utilize HLV with transformers for dialogue generation. Through fine-grained symbolic-level semantic information and additive Gaussian mixing, we construct the distribution of continuous variables, prompting the generation of diverse expressions. Meanwhile, to maintain relevance and coherence, the discrete latent variable is optimized by self-separation training. Experimental results on two dialogue generation datasets (DailyDialog and OpenSubtitles) show that CHVT is superior to traditional transformer-based variational mechanisms w.r.t. diversity, relevance, and coherence metrics. Moreover, we also demonstrate the benefit of applying HLV to fine-tuning two pre-trained dialogue models (PLATO and BART-base).
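As a sketch of what an "additive Gaussian mixing" construction for the continuous latent variable could look like, the module below lets each token contribute a Gaussian whose weighted parameters are summed before reparameterized sampling. Dimensions, the weighting scheme, and the combination rule are assumptions and may differ from the exact CHVT formulation.

```python
# Sketch of an additive Gaussian mixing step for a continuous latent variable:
# each token contributes a Gaussian, and the mixture is collapsed by weighted
# addition before reparameterized sampling. Dimensions, weighting, and the
# combination rule are assumptions, not the exact CHVT formulation.
import torch
import torch.nn as nn

class AdditiveGaussianLatent(nn.Module):
    def __init__(self, d_model=256, d_latent=64):
        super().__init__()
        self.to_mu = nn.Linear(d_model, d_latent)
        self.to_logvar = nn.Linear(d_model, d_latent)
        self.to_weight = nn.Linear(d_model, 1)

    def forward(self, hidden):                              # hidden: [batch, seq, d_model]
        w = torch.softmax(self.to_weight(hidden), dim=1)    # per-token mixing weights
        mu = (w * self.to_mu(hidden)).sum(dim=1)            # additively mixed mean
        logvar = (w * self.to_logvar(hidden)).sum(dim=1)    # additively mixed log-variance
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return z, mu, logvar

layer = AdditiveGaussianLatent()
z, mu, logvar = layer(torch.randn(2, 10, 256))
```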
Complex dialogue mappings (CDM), including one-to-many and many-to-one mappings, tend to make dialogue models generate incoherent or dull responses, and modeling these mappings remains a huge challenge for neural dialogue systems. To alleviate these problems, methods such as introducing external information, reconstructing the optimization function, and manipulating data samples have been proposed; however, they primarily focus on avoiding training with CDM, inevitably weakening the model's ability to understand CDM in human conversations and limiting further improvements in model performance. This paper proposes a Sentence Semantic \textbf{Seg}mentation guided \textbf{C}onditional \textbf{V}ariational \textbf{A}uto-\textbf{E}ncoder (SegCVAE) method which can model and take advantage of CDM data. Specifically, to tackle the incoherence problem caused by one-to-many mappings, SegCVAE uses response-related prominent semantics to constrain the latent variable. To mitigate the non-diverse problem brought by many-to-one mappings, SegCVAE segments multiple prominent semantics to enrich the latent variables. Three novel components, Internal Separation, External Guidance, and Semantic Norms, are proposed to achieve SegCVAE. On dialogue generation tasks, both automatic and human evaluation results show that SegCVAE achieves new state-of-the-art performance.
Blind image super-resolution (Blind-SR) aims to recover a high-resolution (HR) image from its corresponding low-resolution (LR) input image with unknown degradations. Most existing works design an explicit degradation estimator for each degradation to guide SR. However, it is infeasible to provide concrete labels for the many combinations of degradations (\eg, blur, noise, JPEG compression) to supervise the degradation estimator's training. In addition, these special designs for certain degradations, such as blur, impede the models from being generalized to handle different degradations. To this end, it is necessary to design an implicit degradation estimator that can extract a discriminative degradation representation for all degradations without relying on supervision from degradation ground truth. In this paper, we propose a Knowledge Distillation based Blind-SR network (KDSR). It consists of a knowledge distillation based implicit degradation estimator network (KD-IDE) and an efficient SR network. To learn the KDSR model, we first train a teacher network, KD-IDE$_{T}$, which takes paired HR and LR patches as inputs and is optimized jointly with the SR network. Then, we further train a student network, KD-IDE$_{S}$, which only takes LR images as input and learns to extract the same implicit degradation representation (IDR) as KD-IDE$_{T}$. In addition, to make full use of the extracted IDR, we design a simple, strong, and efficient IDR based dynamic convolution residual block (IDR-DCRB) to build the SR network. We conduct extensive experiments under classic and real-world degradation settings. The results show that KDSR achieves SOTA performance and can generalize to various degradation processes. The source code and pre-trained models will be released.
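The snippet below sketches the general idea of a dynamic convolution residual block conditioned on an implicit degradation representation (IDR): the IDR predicts a per-channel modulation applied inside a residual block. The actual IDR-DCRB design in KDSR may differ; layer sizes and the modulation rule are assumptions.

```python
# Sketch of a dynamic convolution residual block conditioned on an implicit
# degradation representation (IDR). The actual IDR-DCRB in KDSR may differ;
# sizes and the modulation rule here are illustrative assumptions.
import torch
import torch.nn as nn

class IDRDynamicResBlock(nn.Module):
    def __init__(self, channels=64, idr_dim=128):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.mod = nn.Linear(idr_dim, channels)     # IDR -> per-channel scale
        self.act = nn.ReLU(inplace=True)

    def forward(self, x, idr):                      # x: [B, C, H, W], idr: [B, idr_dim]
        scale = torch.sigmoid(self.mod(idr)).unsqueeze(-1).unsqueeze(-1)
        out = self.act(self.conv1(x) * scale)       # degradation-aware modulation
        out = self.conv2(out)
        return x + out                              # residual connection

block = IDRDynamicResBlock()
y = block(torch.randn(2, 64, 32, 32), torch.randn(2, 128))
```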
The 1$^{\text{st}}$ Workshop on Maritime Computer Vision (MaCVi) 2023 focused on maritime computer vision for Unmanned Aerial Vehicles (UAVs) and Unmanned Surface Vehicles (USVs), and organized several subchallenges in this domain: (i) UAV-based Maritime Object Detection, (ii) UAV-based Maritime Object Tracking, (iii) USV-based Maritime Obstacle Segmentation, and (iv) USV-based Maritime Obstacle Detection. The subchallenges were based on the SeaDronesSee and MODS benchmarks. This report summarizes the main findings of the individual subchallenges and introduces a new benchmark, called SeaDronesSee Object Detection v2, which extends the previous benchmark by including more classes and footage. We provide statistical and qualitative analyses, and assess trends in the best-performing methodologies across over 130 submissions. The methods are summarized in the appendix. The datasets, evaluation code, and leaderboard are publicly available at https://seadronessee.cs.uni-tuebingen.de/macvi.
With the advances of the Internet of Things (IoT) and 5G/6G wireless communications, the paradigm of mobile computing has evolved significantly in recent years, from centralized mobile cloud computing to distributed fog computing and mobile edge computing (MEC). MEC pushes compute-intensive tasks to the edge of the network and places resources as close to the endpoints as possible, addressing the shortcomings of mobile devices in terms of storage space, resource optimization, computational performance, and efficiency. Compared with cloud computing, MEC, as a distributed infrastructure located closer to users, and its convergence with other emerging technologies, including the Metaverse, 6G wireless communications, artificial intelligence (AI), and blockchain, also address the problems of network resource allocation, increased network load, and latency requirements. This paper therefore studies computing paradigms that meet the stringent requirements of modern applications, and presents application scenarios of MEC in mobile augmented reality (MAR). Furthermore, this survey presents the motivation for building the Metaverse on MEC and introduces applications of MEC to the Metaverse. Particular emphasis is placed on the convergence of the above technologies, such as 6G with the MEC paradigm and the strengthening of MEC through blockchain.
In minimally invasive surgery, surgical workflow segmentation from video analysis is a well-studied topic. The conventional approach defines it as a multi-class classification problem, in which individual video frames are attributed a surgical phase label. We introduce a novel reinforcement learning formulation for offline phase transition retrieval. Instead of attempting to classify every video frame, we identify the timestamp of each phase transition. By construction, our model does not produce spurious and noisy phase transitions, but contiguous phase blocks. We investigate two different configurations of this model. The first does not require processing all frames in the video (only <60% and <20% of frames in two different applications), while producing results slightly below state-of-the-art accuracy. The second configuration processes all video frames and outperforms the state of the art at a comparable computational cost. We compare our method against the recent top-performing frame-based approaches TeCNO and Trans-SVNet on the public dataset Cholec80 and also on an in-house dataset of laparoscopic sacrocolpopexy. We perform both a frame-based evaluation (accuracy, precision, recall, and F1-score) and an event-based evaluation (event ratio) of our algorithm.
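To make the "by construction" argument concrete, the toy function below expands predicted phase-transition timestamps into contiguous per-frame phase labels, so the output cannot contain isolated, spurious phase flips. Timestamps and the label convention are purely illustrative.

```python
# Toy illustration of timestamp-based phase retrieval: predicted transition
# timestamps are expanded into contiguous per-frame phase labels, so no
# isolated, spurious phase flips can appear. Values are illustrative.
import numpy as np

def timestamps_to_frame_labels(transitions, n_frames):
    """transitions: sorted frame indices where a new phase starts (excluding frame 0)."""
    labels = np.zeros(n_frames, dtype=int)
    for phase, start in enumerate(transitions, start=1):
        labels[start:] = phase          # every later frame belongs to the new phase
    return labels

print(timestamps_to_frame_labels([120, 480, 900], n_frames=1000))
```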
Deep learning models used in medical image analysis readily raise reliability concerns due to their black-box nature. To shed light on these black-box models, prior work has mainly focused on identifying the contribution of input features to the diagnosis, i.e., feature attribution. In this work, we explore counterfactual explanations to identify which patterns the model relies on for its diagnosis. Specifically, we investigate the effect of varying features within chest X-rays on the classifier's output to understand its decision mechanism. We leverage a StyleGAN-based approach (StyleEx) to create counterfactual explanations for chest X-rays by manipulating specific latent directions in their latent space. In addition, we propose an approach that substantially reduces the computation time needed to generate explanations. We clinically evaluate the relevance of the counterfactual explanations with the help of radiologists. Our code is publicly available.
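As a rough sketch of the latent-direction manipulation described above, the function below shifts a latent code along a fixed direction, decodes each shifted code, and records the classifier's response. The toy generator, classifier, and random direction are stand-ins; the actual StyleEx pipeline identifies classifier-specific directions in a StyleGAN latent space.

```python
# Rough sketch of latent-direction manipulation for counterfactual-style
# explanations: shift a latent code along one direction, decode, and read the
# classifier's response. The toy generator, classifier, and random direction
# are stand-ins; StyleEx itself finds classifier-specific StyleGAN directions.
import torch
import torch.nn as nn

def counterfactual_sweep(generator, classifier, z, direction, alphas=(-2, -1, 0, 1, 2)):
    """Return the target-class probability as z moves along `direction`."""
    probs = []
    for a in alphas:
        img = generator(z + a * direction)                 # decode the shifted latent
        p = torch.softmax(classifier(img), dim=-1)[:, 1]   # probability of class 1
        probs.append(p.detach())
    return torch.stack(probs, dim=0)

# Toy stand-ins so the sketch runs end to end.
generator = nn.Sequential(nn.Linear(16, 3 * 8 * 8), nn.Unflatten(1, (3, 8, 8)))
classifier = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 2))
print(counterfactual_sweep(generator, classifier, torch.randn(1, 16), torch.randn(1, 16)))
```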